| Drink Category | Average fat content (g) | Average sugar content (g) | Average calories |
|---|---|---|---|
| Classic Espresso Drinks | 3 | 17 | 140 |
| Coffee | 0 | 0 | 4 |
| Frappuccino Blended Coffee | 3 | 57 | 277 |
| Frappuccino Blended Creme | 2 | 48 | 233 |
| Frappuccino Light Blended Coffee | 1 | 32 | 162 |
| Shaken Iced Beverages | 0 | 26 | 114 |
| Signature Espresso Drinks | 5 | 39 | 250 |
| Smoothies | 2 | 37 | 282 |
| Tazo Tea Drinks | 3 | 30 | 177 |
Exploratory Data Analysis of Starbucks’ and Dunkin Donuts’ Nutritional Information
Data Science 1 with R (STAT 301-1)
Introduction
In this report, I examine the nutritional information of Starbucks’ and Dunkin Donuts’ food and drink items. The first dataset contains nutritional information for all beverages available at Starbucks. The second dataset covers all of the food items available at Starbucks. The third dataset covers the nutritional information of all products, both food and drink items, available at Dunkin Donuts.
Motivation for Research
I chose these datasets because I normally enjoy both Starbucks’ and Dunkin Donuts’ offerings. However, it must be noted that many students and I are currently boycotting Starbucks due to the company’s decision not to support the Palestinian people given the humanitarian crisis they are currently experiencing. While I enjoy their products, I currently will not support the company due to this issue.
Nevertheless, I have had plenty of items from both their drink and food menus and I feel knowledgeable enough to work with data concerning both. Also, I really enjoy making and trying intricate coffee and tea drinks, so I’d be happy to work with data regarding Starbucks and Dunkin Donuts. I decided to include the food information as well, as a challenge to work with multiple sets of data. Working with data from both Starbucks and Dunkin Donuts allows more depth in the exploration of nutrition information.
Research Questions
The overarching research question for this EDA is what one can consume at either Starbucks or Dunkin Donuts in order to maintain a healthy, nutritious diet. There are many ways to address this question. Some initial queries are how calories, fat, protein, and sugar vary by drink type or by food item at either retailer. To involve both food and drink items, I propose investigating which combination of food and drink items is most nutritious at either Starbucks or Dunkin Donuts. A related question is which drink or food items are least nutritious and should generally be avoided. Lastly, I propose asking how the nutritional value of food and drinks differs between Starbucks and Dunkin Donuts, and which restaurant has the most nutritious options overall.
Data Overview and Quality
The three raw datasets were copied into new datasets, which were then cleaned and prepared for analysis. The first raw dataset, entitled “starbucks.csv”, contains 18 variables and 242 observations, corresponding to 242 different drinks. Three of the variables contain categorical data, corresponding to drink identifiers, and the remaining 15 variables contain numerical data, corresponding to nutritional information. Some variables (specifically, caffeine) have missing values, which are recorded as NA instead of a number.
The second raw dataset, stored as “starbucks_menu_nutrition_food_redo.csv”, contains 6 variables and 113 observations, corresponding to 113 different food items. One variable contains categorical data, corresponding to the food name. The remaining five variables contain numerical data, corresponding to the nutritional information, such as calories, fat, carbohydrates, fiber, and protein content. There are no missing values in this data set.
The third raw dataset, stored as “dunkindonutsnutrition.csv”, contains 13 variables and 790 observations, corresponding to 780 different products sold by Dunkin Donuts. All variables are stored as character vectors. Two variables concern the item category and name, while the remaining variables concern nutritional information. There are no missing values in this data set either.
Raw dataset links:
– Starbucks drinks dataset:
https://github.com/reisanar/datasets/blob/master/starbucks.csv
– Starbucks food dataset:
https://www.kaggle.com/datasets/starbucks/starbucks-menu/
– Dunkin Donuts dataset:
https://www.kaggle.com/datasets/joebeachcapital/dunkin-donuts-nutrition
Starbucks Data Cleaning and Preparation
In the 0a_starbucks_data_preparation R script, the dataset “starbucks_menu_nutrition_food_redo.csv” was read in, copied as starbucks_food_data, and then cleaned. The first issue with the raw data was that the variable names had spaces, which could make it difficult to select these variables. The spaces were replaced with underscores using the names() function. Also, the variable names were made lowercase for consistency.
The dataset “starbucks.csv” was read in, copied as starbucks_drinks_data, and then cleaned as well. The data frame had the same issue in that the column names had uppercase letters and spaces. This was addressed using the names() function as well. Additionally, in the column names “Total Fat g”, “Dietary Fiber g”, and “Total Carbohydrates g”, the initial word was removed so these variables would have names identical to those in starbucks_food_data. This facilitated joining observations in both data frames.
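The renaming steps described above can be sketched in base R as follows. This is a minimal illustration, not the author's script: the data frame `raw_drinks`, its values, and the helper `clean_names()` are hypothetical, and only the three column names quoted in the text are taken from the report.

```r
# Hypothetical stand-in for the raw drinks data; only the column names matter here.
raw_drinks <- data.frame(
  "Beverage" = c("Example Latte", "Example Coffee"),
  "Total Fat g" = c(7, 0.1),
  "Dietary Fiber g" = c(0, 0),
  "Total Carbohydrates g" = c(19, 0),
  check.names = FALSE  # keep the spaces so they can be cleaned explicitly
)

# Hypothetical helper: lowercase, drop a leading "total"/"dietary", underscore the spaces.
clean_names <- function(x) {
  x <- tolower(x)
  x <- sub("^(total|dietary) ", "", x)
  gsub(" ", "_", x, fixed = TRUE)
}

names(raw_drinks) <- clean_names(names(raw_drinks))
print(names(raw_drinks))
# "beverage" "fat_g" "fiber_g" "carbohydrates_g"
```

Note that this sketch yields `carbohydrates_g`; a further rename would be needed to reach the shorter `carb_g` used later in the report.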
In the beverage_prep column of starbucks_drinks_data, the beverage size was missing in some observations but could be deduced from the size listed for the previous (or second previous) beverage. The size was added to this column using a for loop to iterate through each observation and if statements to identify the size and then add it to the observation.
Additionally, the information in the beverage_prep column in starbucks_drinks_data was used to create two new variables: one for milk type and one for size. First, the column milk_type was developed by extracting the type of milk from the beverage_prep column. Drinks without any milk intentionally had NA in this column. A for loop was used to iterate through each observation in the beverage_prep column, and if statements were used to identify the type of milk and add it to the milk_type column for each observation. The column size was added by extracting the size of drink from the beverage_prep column. A for loop was used to iterate through each observation in the beverage_prep column, and if statements were used to identify the beverage size and add it to the size column for each observation.
Finally, a new data frame entitled starbucks_all_data was created by joining the two existing data frames with the merge() function. The beverage and food_item variables were renamed to item, so the data frames could share this variable. Then, the two data frames were combined on the variables item, calories, fat_g, carb_g, fiber_g, and protein_g; only these variables were selected from each data frame when it was passed to merge().
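A minimal sketch of the carry-forward idea, assuming the size label appears only on the first row of each group of preparations. The example values and the set of size labels are assumptions for illustration and are not taken from the cleaned data.

```r
# Hypothetical beverage_prep values: the size appears only on the first row of each group.
beverage_prep <- c("Short Nonfat Milk", "2% Milk", "Soymilk",
                   "Tall Nonfat Milk", "2% Milk", "Soymilk")
sizes <- c("Short", "Tall", "Grande", "Venti")  # assumed set of size labels

last_size <- NA_character_
for (i in seq_along(beverage_prep)) {
  has_size <- FALSE
  for (s in sizes) {
    if (grepl(s, beverage_prep[i], fixed = TRUE)) {
      last_size <- s      # remember the most recent explicit size
      has_size <- TRUE
    }
  }
  if (!has_size && !is.na(last_size)) {
    # No size on this row: prepend the size carried over from an earlier row.
    beverage_prep[i] <- paste(last_size, beverage_prep[i])
  }
}
print(beverage_prep)
```

Tracking the last seen size handles both the "previous" and "second previous" cases with one rule, since the remembered value persists until a new size appears.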
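The extraction of milk_type and size might look roughly like the sketch below. The input strings and the label vectors are assumptions; the milk labels mirror the milk types that appear later in this report.

```r
# Hypothetical beverage_prep values after sizes have been filled in.
beverage_prep <- c("Short Nonfat Milk", "Tall 2% Milk", "Grande Soymilk",
                   "Venti", "Tall Whole Milk")
milk_labels <- c("Nonfat Milk", "2% Milk", "Soymilk", "Whole Milk")  # assumed labels
size_labels <- c("Short", "Tall", "Grande", "Venti")                 # assumed labels

# Start with NA so drinks without milk (or without a size) stay NA.
milk_type <- rep(NA_character_, length(beverage_prep))
size <- rep(NA_character_, length(beverage_prep))

for (i in seq_along(beverage_prep)) {
  for (m in milk_labels) {
    if (grepl(m, beverage_prep[i], fixed = TRUE)) milk_type[i] <- m
  }
  for (s in size_labels) {
    if (grepl(s, beverage_prep[i], fixed = TRUE)) size[i] <- s
  }
}

drinks <- data.frame(beverage_prep, milk_type, size)
print(drinks)
```

Because grepl() is vectorized, the same result could be obtained without the outer loop; the loop form is kept here only to match the approach described in the text.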
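The join can be sketched as below. All rows are made-up examples, and `all = TRUE` is an assumption: with every shared column used as a key, merge() keeps only exactly matching rows by default, so a full outer join is what stacks the food and drink items together.

```r
# Made-up example rows; values are illustrative only.
food <- data.frame(food_item = c("Example Bagel", "Example Cookie"),
                   calories = c(280, 170), fat_g = c(1.5, 5),
                   carb_g = c(56, 30), fiber_g = c(2, 2), protein_g = c(9, 2))
drinks <- data.frame(beverage = c("Example Latte", "Example Tea"),
                     calories = c(190, 0), fat_g = c(7, 0),
                     carb_g = c(19, 0), fiber_g = c(0, 0), protein_g = c(12, 0),
                     caffeine_mg = c(150, 40))

# Give both data frames the same identifier column.
names(food)[names(food) == "food_item"] <- "item"
names(drinks)[names(drinks) == "beverage"] <- "item"

shared <- c("item", "calories", "fat_g", "carb_g", "fiber_g", "protein_g")

# Keep only the shared columns and take the union of rows (assumed all = TRUE).
all_items <- merge(food[shared], drinks[shared], by = shared, all = TRUE)
print(all_items)
```

When the two inputs share every column, `rbind(food[shared], drinks[shared])` is an equivalent and arguably simpler way to stack them.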
Dunkin Donuts Data Cleaning and Preparation
In the 0b_dunkin_data_preparation R script, the dataset “dunkindonutsnutrition.csv” was read in, copied as dunkin_donuts_data, and then cleaned. First, the variable names were changed with the names() function to remove parentheses and spaces. Additionally, the names were made lowercase for consistency.
A new variable, item_type, was created as a factor. The level “drink” was assigned to drinks (based on the category variable) and the level “food” was assigned to food items. Variables corresponding to nutritional information were converted into numeric vectors, as they generally contained numeric values. Moreover, a new factor variable, size, was created to identify the size of each product. The size was extracted from the item name using the grepl() function. Similarly, a new factor variable, milk_type, was created to identify the type of milk in each product, if applicable. Items without a size or milk type were assigned the level “not applicable”.
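A rough base R sketch of these steps follows. The rows, the list of drink categories, and the size labels are all assumptions made for illustration; the actual script may have identified drinks and sizes differently.

```r
# Made-up rows; every column starts as character, as in the raw Dunkin data.
dunkin <- data.frame(
  category = c("Iced Coffee", "Donuts", "Hot Latte"),
  item = c("Small Iced Coffee with Cream", "Example Donut", "Large Latte with Skim Milk"),
  calories = c("70", "240", "130"),
  stringsAsFactors = FALSE
)

drink_categories <- c("Iced Coffee", "Hot Latte")  # assumed list of drink categories

# Label each row as a drink or a food item based on its category.
dunkin$item_type <- factor(ifelse(dunkin$category %in% drink_categories, "drink", "food"))

# Convert nutritional columns from character to numeric.
dunkin$calories <- as.numeric(dunkin$calories)

# Extract the size from the item name; rows with no size keep "not applicable".
size <- rep("not applicable", nrow(dunkin))
for (s in c("Small", "Medium", "Large")) {
  size[grepl(s, dunkin$item, fixed = TRUE)] <- s
}
dunkin$size <- factor(size)
print(dunkin)
```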
Finally, the data was filtered into two new datasets: dunkin_donuts_drinks_data, containing the drink observations; and dunkin_donuts_food_data, containing the food observations.
From the R scripts, the cleaned datasets were written to CSV files using the write_csv() function, and then they were read back in using the read_csv() function.
Explorations
Analysis of Drinks
Nutritional Content by Drink Category
Sugar, Fat, and Calories Content
The first question I sought to answer was which drinks are considered the healthiest. I initially approached this question by comparing the fat, sugar, and calorie content of drinks. In these categories, a beverage with a lower amount of each of these nutrients is considered healthier. For this analysis, I compared drink categories, rather than the individual drinks themselves, as this made it easier to generalize about drinks.
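Category-level averages of this kind can be computed with aggregate() in base R. The sketch below uses made-up rows and assumed column names (`beverage_category`, `fat_g`, `sugars_g`, `calories`); it is not the code that produced the tables in this report.

```r
# Made-up drink rows; column names are assumptions.
drinks <- data.frame(
  beverage_category = c("Coffee", "Coffee", "Smoothies", "Smoothies"),
  fat_g = c(0, 0.2, 1, 3),
  sugars_g = c(0, 0, 30, 40),
  calories = c(3, 5, 260, 300)
)

# Mean of each nutrient within each drink category.
category_means <- aggregate(cbind(fat_g, sugars_g, calories) ~ beverage_category,
                            data = drinks, FUN = mean)

# Round to whole numbers for display, leaving the category column untouched.
category_means[-1] <- round(category_means[-1])
print(category_means)
```

Grouping by category trades detail for readability: a single very sugary drink can pull a category's average up, which is one reason boxplots are a useful companion to a table of means.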
The following table and three figures depict the fat, sugar, and calorie content of types of drinks sold by Starbucks.
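Among Starbucks drink categories, Coffee is the lightest option, averaging 0 g of fat, 0 g of sugar, and 4 calories. Frappuccino Blended Coffee has the highest average sugar content (57 g), Smoothies have the highest average calories (282), and Signature Espresso Drinks have the highest average fat content (5 g).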
The following table and three figures depict the fat, sugar, and calorie content of types of drinks sold by Dunkin Donuts.
| Drink Category | Average fat content (g) | Average sugar content (g) | Average calories |
|---|---|---|---|
| Cold Brew Coffee | 5 | 8 | 80 |
| Coolatta | 2 | 96 | 427 |
| Dunkin Refreshers | 5 | 60 | 320 |
| Frozen Coffee | 13 | 118 | 640 |
| Hot Americano | 0 | 0 | 8 |
| Hot Cappuccino | 3 | 39 | 223 |
| Hot Chocolate | 13 | 48 | 368 |
| Hot Coffee | 4 | 27 | 162 |
| Hot Latte | 6 | 43 | 274 |
| Hot Macchiato | 4 | 39 | 233 |
| Iced Americano | 0 | 0 | 8 |
| Iced Cappuccino | 3 | 39 | 223 |
| Iced Coffee | 3 | 24 | 142 |
| Iced Latte | 6 | 43 | 273 |
| Iced Macchiato | 3 | 39 | 226 |
| Iced Tea | 0 | 25 | 109 |
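Among Dunkin Donuts drink categories, Frozen Coffee has the highest averages on all three measures: 13 g of fat (tied with Hot Chocolate), 118 g of sugar, and 640 calories. Hot and Iced Americanos are the lightest options, averaging 0 g of fat, 0 g of sugar, and 8 calories.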
Protein, Calcium, and Vitamin A and C Content
I then addressed this question by investigating which beverages at each retailer had the highest amount of protein, calcium, and vitamins. In these categories, a beverage with a higher content of each of these nutrients is considered healthier.
The Starbucks drinks data contains information regarding each drink’s protein, calcium, and vitamin A and C content. The content of each of these nutrients is compared in the following table and graphs. As with the nutrients in the previous section, a boxplot was used to compare protein content. However, frequency polygons were used to compare calcium, vitamin A, and vitamin C content because few data points differ from 0%.
| Drink Category | Average protein content (g) | Average calcium (%DV) | Average vitamin A (%DV) | Average vitamin C (%DV) |
|---|---|---|---|---|
| Classic Espresso Drinks | 9 | 0 | 0 | 0 |
| Coffee | 1 | 0 | 0 | 0 |
| Frappuccino Blended Coffee | 4 | 0 | 0 | 0 |
| Frappuccino Blended Creme | 4 | 0 | 0 | 0 |
| Frappuccino Light Blended Coffee | 4 | 0 | 0 | 0 |
| Shaken Iced Beverages | 1 | 0 | 0 | 0 |
| Signature Espresso Drinks | 10 | 0 | 0 | 0 |
| Smoothies | 17 | 0 | 0 | 1 |
| Tazo Tea Drinks | 7 | 0 | 0 | 0 |
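Smoothies have the highest average protein content (17 g), followed by Signature Espresso Drinks (10 g) and Classic Espresso Drinks (9 g). In the table, average calcium, vitamin A, and vitamin C content rounds to 0% of the daily value for every category except Smoothies, which average 1% for vitamin C.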
The Dunkin Donuts drinks data only contains information regarding each drink’s protein content. The protein content of each type of drink sold at Dunkin is compared in the following table and boxplot.
| Drink Category | Average protein content (g) |
|---|---|
| Cold Brew Coffee | 1 |
| Coolatta | 2 |
| Dunkin Refreshers | 3 |
| Frozen Coffee | 7 |
| Hot Americano | 0 |
| Hot Cappuccino | 7 |
| Hot Chocolate | 3 |
| Hot Coffee | 3 |
| Hot Latte | 10 |
| Hot Macchiato | 8 |
| Iced Americano | 0 |
| Iced Cappuccino | 7 |
| Iced Coffee | 3 |
| Iced Latte | 10 |
| Iced Macchiato | 7 |
| Iced Tea | 0 |
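Hot and Iced Lattes have the highest average protein content (10 g each), followed by Hot Macchiatos (8 g). Hot Americanos, Iced Americanos, and Iced Tea average 0 g of protein.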
Nutritional Content by Milk Type
Another question I sought to answer is how the nutrition content of each drink varies by milk type. When people order drinks from coffee shops, there are many different ways they can customize their drink to their liking. Thus, I found it important to analyze nutrition by milk preference. For this analysis, I measured the fat, sugar, calories, and then protein of drinks having grouped them by milk type. I created boxplots to depict how these nutrients vary by each type of milk.
The following table and boxplots depict the average fat, sugar, calories, and protein for drinks of each type of milk at Starbucks. Observations with no milk were excluded from this analysis.
| Milk Type | Fat (g) | Sugar content (g) | Calories | Protein content (g) |
|---|---|---|---|---|
| 2% Milk | 6 | 31 | 218 | 10 |
| Nonfat Milk | 1 | 36 | 190 | 8 |
| Soymilk | 4 | 32 | 207 | 7 |
| Whole Milk | 5 | 56 | 284 | 4 |
| NA | 0 | 17 | 75 | 0 |
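Among Starbucks drinks made with milk, nonfat milk drinks have the lowest average fat (1 g) and calories (190). Whole milk drinks have the highest average sugar (56 g) and calories (284) and the lowest average protein (4 g), while 2% milk drinks have the highest average protein (10 g) and fat (6 g).
The following table and boxplots report the average fat, sugar, calories, and protein for drinks of each type of milk at Dunkin Donuts. Observations with no milk were excluded from this analysis.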
| Milk Type | Fat (g) | Sugar content (g) | Calories | Protein content (g) |
|---|---|---|---|---|
| cream | 13 | 48 | 332 | 4 |
| skim | 1 | 44 | 229 | 8 |
| whole | 7 | 44 | 283 | 8 |
Due to large overlap of boxes in the boxplots, as well as an abundance of outliers, I added density plots to better distinguish the nutritional information of each type of milk.
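At Dunkin Donuts, drinks made with cream have the highest average fat (13 g), sugar (48 g), and calories (332) and the lowest average protein (4 g). Skim and whole milk drinks have the same average sugar (44 g) and protein (8 g), but skim milk drinks are lower in fat (1 g versus 7 g) and calories (229 versus 283).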
New metrics to assess healthiness of drinks
In the previous section, I analyzed which types of beverage had high (or low) contents of specific nutrients. There are more nuanced ways to assess the “healthiness” of a drink. In this section, I aim to incorporate multiple variables to develop new ways to analyze healthiness.
Beverages rich in protein
First, I sought to determine which types of beverages were richest in protein while lowest in less healthy quantities, such as calories and fat. I compared the ratio of protein to calories for each drink type. Protein content could also be compared to sugar, fat, or other quantities to produce different metrics.
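As a worked example of this metric, the sketch below recomputes the ratio for three Starbucks categories using the rounded averages reported in this section. The data frame and column names are hypothetical; only the protein and calorie values come from the report.

```r
# Rounded category averages taken from the Starbucks table in this section.
protein_by_category <- data.frame(
  category = c("Classic Espresso Drinks", "Coffee", "Smoothies"),
  protein_g = c(9, 1, 17),
  calories = c(140, 4, 282)
)

# Grams of protein per calorie, rounded to four decimals as in the table.
protein_by_category$protein_per_calorie <-
  round(protein_by_category$protein_g / protein_by_category$calories, 4)

# Rank categories from highest to lowest ratio.
ranked <- protein_by_category[order(-protein_by_category$protein_per_calorie), ]
print(ranked)
# Coffee 0.2500, Classic Espresso Drinks 0.0643, Smoothies 0.0603
```

Coffee ranks first only because its denominator is tiny (4 calories for 1 g of protein), which illustrates a limitation of the ratio: it rewards near-zero-calorie drinks even when they supply very little protein in absolute terms.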
In the following table and scatterplot, I compare the ratio of grams of protein to calories for types of drinks at Starbucks. Due to difficulty in distinguishing data points in the scatterplot, I added faceted scatterplots as well.
| Drink Category | Protein content (g) | Calories | Ratio of protein (g) to calories |
|---|---|---|---|
| Classic Espresso Drinks | 9 | 140 | 0.0643 |
| Coffee | 1 | 4 | 0.2500 |
| Frappuccino Blended Coffee | 4 | 277 | 0.0144 |
| Frappuccino Blended Creme | 4 | 233 | 0.0172 |
| Frappuccino Light Blended Coffee | 4 | 162 | 0.0247 |
| Shaken Iced Beverages | 1 | 114 | 0.0088 |
| Signature Espresso Drinks | 10 | 250 | 0.0400 |
| Smoothies | 17 | 282 | 0.0603 |
| Tazo Tea Drinks | 7 | 177 | 0.0395 |
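Coffee has by far the highest ratio (0.2500), but only because it averages 4 calories; it provides just 1 g of protein. Among the remaining categories, Classic Espresso Drinks (0.0643) and Smoothies (0.0603) provide the most protein per calorie, while Shaken Iced Beverages (0.0088) provide the least.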
In the following table and scatterplot, I compare the same quantities for drinks at Dunkin Donuts. A single scatterplot and faceted scatterplots were included as well.
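| Drink Category | Protein content (g) | Calories | Ratio of protein (g) to calories |
|---|---|---|---|
| Cold Brew Coffee | 1 | 80 | 0.0125 |
| Coolatta | 2 | 427 | 0.0047 |
| Dunkin Refreshers | 3 | 320 | 0.0094 |
| Frozen Coffee | 7 | 640 | 0.0109 |
| Hot Americano | 0 | 8 | 0.0000 |
| Hot Cappuccino | 7 | 223 | 0.0314 |
| Hot Chocolate | 3 | 368 | 0.0082 |
| Hot Coffee | 3 | 162 | 0.0185 |
| Hot Latte | 10 | 274 | 0.0365 |
| Hot Macchiato | 8 | 233 | 0.0343 |
| Iced Americano | 0 | 8 | 0.0000 |
| Iced Cappuccino | 7 | 223 | 0.0314 |
| Iced Coffee | 3 | 142 | 0.0211 |
| Iced Latte | 10 | 273 | 0.0366 |
| Iced Macchiato | 7 | 226 | 0.0310 |
| Iced Tea | 0 | 109 | 0.0000 |
Hot and Iced Lattes provide the most protein per calorie (0.0365 and 0.0366), followed by Hot Macchiatos (0.0343). Americanos and Iced Tea average 0 g of protein, so their ratio is zero, and Coolattas have the lowest nonzero ratio (0.0047).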
Beverages rich in vitamins
Another way I sought to assess how healthy beverages were was by assessing their overall vitamin and mineral contents. Unfortunately, this information was not available for the Dunkin Donuts drinks data. The Starbucks drinks data contains information regarding vitamin A, vitamin C, calcium, and iron. For this analysis, I took the sum of the listed percent daily value of each of these nutrients and divided by four. This calculation gives an estimate of how much of one’s daily vitamin and mineral needs, across these four nutrients, each drink type provides. The following table contains the average vitamin/mineral percent daily value for each drink type at Starbucks.
| Drink Category | Vitamin A (%DV) | Vitamin C (%DV) | Calcium (%DV) | Iron (%DV) | Average %DV of Vitamins and Minerals |
|---|---|---|---|---|---|
| Classic Espresso Drinks | 0 | 0 | 0 | 0 | 0.00 |
| Coffee | 0 | 0 | 0 | 0 | 0.00 |
| Frappuccino Blended Coffee | 0 | 0 | 0 | 0 | 0.00 |
| Frappuccino Blended Creme | 0 | 0 | 0 | 0 | 0.00 |
| Frappuccino Light Blended Coffee | 0 | 0 | 0 | 0 | 0.00 |
| Shaken Iced Beverages | 0 | 0 | 0 | 0 | 0.00 |
| Signature Espresso Drinks | 0 | 0 | 0 | 0 | 0.00 |
| Smoothies | 0 | 1 | 0 | 0 | 0.25 |
| Tazo Tea Drinks | 0 | 0 | 0 | 0 | 0.00 |
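In the table, every category averages 0% for all four nutrients except Smoothies, whose 1% vitamin C yields an overall average of 0.25%.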
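The averaging step can be written out as a short sketch. The column names are hypothetical, and the two rows reuse the rounded values shown for Coffee and Smoothies in the table above.

```r
# Rounded %DV values for two categories, as shown in the table above.
dv <- data.frame(
  category = c("Coffee", "Smoothies"),
  vitamin_a_dv = c(0, 0),
  vitamin_c_dv = c(0, 1),
  calcium_dv = c(0, 0),
  iron_dv = c(0, 0)
)

nutrient_cols <- c("vitamin_a_dv", "vitamin_c_dv", "calcium_dv", "iron_dv")

# Sum the four percent daily values and divide by the number of nutrients (four).
dv$avg_dv <- rowSums(dv[nutrient_cols]) / length(nutrient_cols)
print(dv[c("category", "avg_dv")])
# Coffee 0.00, Smoothies 0.25
```

Because the four nutrients are weighted equally, this metric treats 1% of vitamin C the same as 1% of iron; it is a rough summary rather than a nutritional standard.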
Analysis of food data
As with the drinks, I then sought to investigate which food items are considered the healthiest. The data regarding the beverages offered at Starbucks and Dunkin Donuts was largely similar and contained the same nutritional information. However, the two food datasets, starbucks_food_data and dunkin_donuts_food_data, have far fewer similarities. The Starbucks food data contains less nutrient information and does not categorize the food items. The Dunkin food data is more detailed and labels the food items at Dunkin Donuts by type. Thus, for this section of the EDA, the analysis was split between starbucks_food_data and dunkin_donuts_food_data.
Nutritional information of Starbucks food items
First, I identified the healthiest Starbucks food items in five separate categories: highest in protein, highest in fiber, lowest in fat, lowest in carbohydrates, and lowest in calories. The top ten food items in each category are listed in the tables below.
| item | protein_g |
|---|---|
| Turkey Pesto Panini | 34 |
| Roasted Turkey & Dill Havarti Sandwich | 32 |
| Turkey & Havarti Sandwich | 29 |
| Za'atar Chicken & Lemon Tahini Salad | 27 |
| Chicken & Quinoa Protein Bowl with Black Beans and Greens | 27 |
| Spicy Chorizo Monterey Jack & Egg Breakfast Sandwich | 26 |
| Ancho Chipotle Chicken Panini | 26 |
| Turkey & Fire-Roasted Corn Salad | 24 |
| Smoked Turkey Protein Box | 24 |
| Slow-Roasted Ham Swiss & Egg Breakfast Sandwich | 24 |
| item | fiber_g |
|---|---|
| Lentils & Vegetable Protein Bowl with Brown Rice | 21 |
| Za'atar Chicken & Lemon Tahini Salad | 11 |
| Strawberries & Jam Sandwich | 10 |
| Green Goddess Avocado Salad | 10 |
| Chicken & Quinoa Protein Bowl with Black Beans and Greens | 9 |
| Multigrain Bagel | 8 |
| 8-Grain Roll | 7 |
| Sprouted Grain Vegan Bagel | 7 |
| Roasted Carrot & Kale Side Salad | 7 |
| Turkey & Fire-Roasted Corn Salad | 7 |
| item | fat_g |
|---|---|
| Seasonal Fruit Blend | 0.0 |
| Cinnamon Raisin Bagel | 1.0 |
| Plain Bagel | 1.5 |
| Classic Whole-Grain Oatmeal | 2.5 |
| Hearty Blueberry Oatmeal | 2.5 |
| Berry Trio Yogurt | 2.5 |
| Fresh Blueberries and Honey Greek Yogurt Parfait | 2.5 |
| Frappuccino Cookie Straw | 3.0 |
| Everybody's Favorite - Bantam Bagel (2 Pack) | 3.5 |
| Everything Bagel with Cheese | 3.5 |
| item | carb_g |
|---|---|
| Organic Avocado (Spread) | 5 |
| Justin's Classic Almond Butter | 6 |
| Cauliflower Tabbouleh Side Salad | 7 |
| Garden Greens & Shaved Parmesan Side Salad | 9 |
| Sous Vide Egg Bites: Bacon & Gruyere | 9 |
| Justin's Chocolate Hazelnut Butter | 12 |
| Sous Vide Egg Bites: Egg White & Red Pepper | 13 |
| Everybody's Favorite - Bantam Bagel (2 Pack) | 14 |
| Frappuccino Cookie Straw | 14 |
| Petite Vanilla Bean Scone | 18 |
| item | calories |
|---|---|
| Frappuccino Cookie Straw | 90 |
| Organic Avocado (Spread) | 90 |
| Seasonal Fruit Blend | 90 |
| Everybody's Favorite - Bantam Bagel (2 Pack) | 100 |
| Petite Vanilla Bean Scone | 120 |
| Cauliflower Tabbouleh Side Salad | 130 |
| Chocolate Cake Pop | 160 |
| Classic Whole-Grain Oatmeal | 160 |
| Chewy Chocolate Cookie | 170 |
| Garden Greens & Shaved Parmesan Side Salad | 170 |
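Rankings like the ones above can be produced by sorting on one column and keeping the first rows. The helper `top_n_items()` and the sample rows below are hypothetical and serve only to illustrate the approach.

```r
# Hypothetical helper: return the n items with the highest (or lowest) value of a column.
top_n_items <- function(data, column, n = 10, highest = TRUE) {
  ranked <- data[order(data[[column]], decreasing = highest), c("item", column)]
  head(ranked, n)
}

# Made-up food rows for illustration.
food <- data.frame(
  item = c("Item A", "Item B", "Item C", "Item D"),
  protein_g = c(34, 5, 27, 12),
  fat_g = c(20, 0, 9, 2.5)
)

print(top_n_items(food, "protein_g", n = 2))                   # highest protein first
print(top_n_items(food, "fat_g", n = 2, highest = FALSE))      # lowest fat first
```

Ranking each nutrient separately explains why an item can look healthy in one table and not another; for example, the calorie and carbohydrate tables above include small treats simply because their portions are small.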
Summary of major findings
In this section of the EDA, I compared nutritional information of drinks and food items at Starbucks and Dunkin Donuts.
Conclusions
References
Arvidsson, J. (2023, September). Dunkin’ Donuts’ Nutrition: Dunkin’ Donuts’ Menu Nutrition, Micronutrients, and Calorie Information. Kaggle. https://www.kaggle.com/datasets/joebeachcapital/dunkin-donuts-nutrition
Sanchez-Arias, R. (2023, October 19). Sample Datasets: A collection of datasets from multiple sources to be used for demonstrations in data science courses. GitHub. https://github.com/reisanar/datasets
Starbucks. (2017). Nutrition factors for Starbucks: Nutrition information for Starbucks menu items, including food and drinks. Kaggle. https://www.kaggle.com/datasets/starbucks/starbucks-menu/data